Ranking
Core Concept
Ranking is a supervised learning task where the goal is to order a set of items by relevance, preference, or utility rather than to assign absolute scores or single labels. Given a context (e.g. a search query, a user profile) and a collection of items (e.g. documents, products, recommendations), a ranking model learns to produce a permutation or ordering that maximizes agreement with desired preferences—typically that more relevant or preferred items appear higher. Unlike regression (which predicts a magnitude) or classification (which predicts a category), ranking focuses on relative ordering: the model need not assign calibrated scores as long as the order is correct. This makes ranking central to search engines, recommendation systems, and any application where presentation order matters.
The Learning Process
The ranking process involves training on examples where the context, the set of items, and (explicitly or implicitly) the preferred ordering or relevance labels are known. Training signals may be full orderings (item A above B above C), pairwise preferences (A preferred to B), relevance grades (e.g. 0–4), or clicks/feedback from users. Learning algorithms optimize objectives that reward putting relevant items above non-relevant ones, e.g. a pairwise loss, a listwise loss, or a surrogate for an information-retrieval metric like NDCG or MAP. During inference, the model scores items (or item–context pairs) and sorts by score to produce the final ranking; the scores need not be calibrated, since only the induced order matters.
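To make the score-then-sort step concrete, here is a minimal sketch in plain Python; the linear scorer, the weights, and the hand-built feature vectors are hypothetical placeholders for whatever trained model and item representation are actually used.

```python
# Minimal sketch of ranking inference: score each candidate item for a
# context, then sort by descending score. The linear scorer and the
# hand-made feature vectors are hypothetical placeholders.

def score(weights, features):
    """Dot product of model weights and item features (an uncalibrated score)."""
    return sum(w * f for w, f in zip(weights, features))

def rank(weights, items):
    """Return item ids ordered by descending score."""
    scored = [(score(weights, feats), item_id) for item_id, feats in items]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item_id for _, item_id in scored]

# Hypothetical query-item feature vectors (e.g. text match, freshness, clicks).
candidates = [
    ("doc_a", [0.2, 0.9, 0.1]),
    ("doc_b", [0.8, 0.3, 0.5]),
    ("doc_c", [0.5, 0.5, 0.5]),
]
weights = [1.0, 0.5, 0.25]

print(rank(weights, candidates))  # ['doc_b', 'doc_c', 'doc_a']
```

Real systems replace the dot product with a learned model (gradient-boosted trees, a neural scorer), but the sort-by-score step at the end is the same.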
Learning Paradigms
How ranking models are trained, distinguished by the granularity of the supervision signal
- Pointwise – Treat each item independently: predict a relevance score or grade per item (regression or classification). Ranking is then obtained by sorting by predicted score. Simple and reuses standard regression/classification, but ignores the list structure and relative nature of relevance.
- Pairwise – Treat pairs of items: learn a binary preference (item A should be ranked above item B or not). The model is trained to correctly order pairs; inference scores items and sorts. Captures relative preference directly but scales with the number of pairs and does not optimize list-level metrics directly.
- Listwise – Treat the entire list as the unit: optimize a loss defined over the full permutation or over a list-level metric (e.g. NDCG, MAP). Closer to the end goal and can yield better rankings, but requires more sophisticated optimization and often approximate or surrogate losses.
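The toy sketch below computes one representative loss per paradigm for a single list of three items: squared error per item (pointwise), a RankNet-style logistic loss over preference pairs (pairwise), and a ListNet-style softmax cross-entropy over the whole list (listwise). The scores and grades are invented purely for illustration.

```python
# One representative loss per paradigm on toy predicted scores and labels.
import math

scores = [2.0, 0.5, 1.0]        # model scores for three items in one list
grades = [3.0, 0.0, 1.0]        # graded relevance labels (e.g. 0-4 scale)

# Pointwise: regress each item's score onto its relevance grade independently.
pointwise = sum((s - g) ** 2 for s, g in zip(scores, grades)) / len(scores)

# Pairwise (RankNet-style): logistic loss on each pair where one item is more
# relevant; the model is penalized when the less relevant item scores higher.
pairwise, pairs = 0.0, 0
for i in range(len(scores)):
    for j in range(len(scores)):
        if grades[i] > grades[j]:
            pairwise += math.log(1.0 + math.exp(-(scores[i] - scores[j])))
            pairs += 1
pairwise /= pairs

# Listwise (ListNet-style): cross-entropy between the softmax of the labels
# and the softmax of the scores, treating the whole list as one example.
def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

p_true = softmax(grades)
p_pred = softmax(scores)
listwise = -sum(t * math.log(p) for t, p in zip(p_true, p_pred))

print(pointwise, pairwise, listwise)
```

In practice these objectives are optimized with gradient-based learners (e.g. gradient-boosted trees or neural networks) rather than evaluated by hand as here.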
Evaluation Metrics
Methods for measuring ranking quality
- NDCG (Normalized Discounted Cumulative Gain) – Discounts relevance by position (higher positions count more); normalized by ideal ranking. Standard in search and recommendation; handles graded relevance and position sensitivity.
- MAP (Mean Average Precision) – Mean over queries of average precision, where average precision averages the precision at each position that holds a relevant item; emphasizes ranking relevant items as high as possible and is best suited to binary relevance.
- MRR (Mean Reciprocal Rank) – Mean over queries of the reciprocal rank of the first relevant item; useful when only the top result or first hit matters.
- Precision@k / Recall@k – Precision@k is the fraction of the top k items that are relevant; Recall@k is the fraction of all relevant items that appear in the top k. Simple, but both ignore order within the top k.
- Pairwise accuracy / Kendall τ – Fraction of item pairs ordered consistently with the ground truth (Kendall τ rescales this to a correlation between −1 and 1); measures agreement of the predicted order with the ground-truth order.
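The sketch below implements single-query versions of these metrics; averaging them over a set of queries gives the "mean" in MAP, MRR, and reported NDCG. The relevance grades are made up, and any grade above zero is treated as relevant for the binary metrics.

```python
# Sketch implementations of common ranking metrics for a single ranked list.
# `relevance` holds graded labels in the order the model ranked the items.
import math

relevance = [3, 0, 2, 0, 1]     # made-up graded relevance, in predicted order

def dcg(rels, k=None):
    rels = rels[:k] if k else rels
    return sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(rels))

def ndcg(rels, k=None):
    ideal = dcg(sorted(rels, reverse=True), k)   # DCG of the ideal ordering
    return dcg(rels, k) / ideal if ideal > 0 else 0.0

def average_precision(rels):
    # Binary view: anything with relevance > 0 counts as relevant.
    hits, total = 0, 0.0
    for i, r in enumerate(rels):
        if r > 0:
            hits += 1
            total += hits / (i + 1)              # precision at this position
    return total / hits if hits else 0.0

def reciprocal_rank(rels):
    for i, r in enumerate(rels):
        if r > 0:
            return 1.0 / (i + 1)
    return 0.0

def precision_at_k(rels, k):
    return sum(1 for r in rels[:k] if r > 0) / k

print(ndcg(relevance, k=5), average_precision(relevance),
      reciprocal_rank(relevance), precision_at_k(relevance, 3))
```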
Common Challenges
- Label scarcity and noise – Relevance and preference labels are often sparse (few clicks, few ratings) and biased (position bias, selection bias), requiring careful collection and debiasing.
- Scale – The item set can be huge (e.g. an entire corpus); ranking is typically done in two stages, retrieval (building a candidate set) followed by ranking (re-ranking the top candidates), and the model must be efficient at inference.
- Distribution shift – The training distribution (e.g. logged data) may differ from the deployment distribution (live traffic); offline metrics may not reflect online performance.
- Diversity and fairness – Beyond relevance, rankings may need to balance diversity, coverage, and fairness across items or providers.
- Cold start – New items or new users have little or no interaction history, making it hard to rank them well.
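As a rough illustration of the two-stage pattern mentioned under Scale, the sketch below uses a cheap term-overlap retriever to narrow a tiny corpus to a candidate set, then applies a toy heuristic in place of a learned re-ranker; both components and the documents are invented for the example.

```python
# Sketch of a two-stage pipeline: cheap retrieval narrows a large corpus
# to a small candidate set, then a costlier re-ranker orders only those.
# The overlap retriever and the heuristic "re-ranker" are toy stand-ins.

corpus = {
    "d1": "cheap flights to tokyo",
    "d2": "tokyo travel guide and tips",
    "d3": "best ramen in new york",
    "d4": "flights and hotels in tokyo",
}

def retrieve(query, docs, k=3):
    """Stage 1: score every document cheaply (term overlap), keep the top k."""
    q_terms = set(query.split())
    scored = [(len(q_terms & set(text.split())), doc_id)
              for doc_id, text in docs.items()]
    scored.sort(reverse=True)
    return [doc_id for score, doc_id in scored[:k] if score > 0]

def rerank(query, doc_ids, docs):
    """Stage 2: apply a more expensive model to the small candidate set.
    Here a toy heuristic stands in for a learned ranking model."""
    def expensive_score(doc_id):
        text = docs[doc_id]
        overlap = len(set(query.split()) & set(text.split()))
        return overlap - 0.01 * len(text)    # toy: reward overlap, penalize length
    return sorted(doc_ids, key=expensive_score, reverse=True)

candidates = retrieve("flights to tokyo", corpus)
print(rerank("flights to tokyo", candidates, corpus))  # ['d1', 'd4', 'd2']
```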